Semi-supervised learning for text classification using feature affinity regularization

Authors

  • Bin Zhang
  • Mari Ostendorf
Abstract

Most conventional semi-supervised learning methods attempt to directly include unlabeled data into training objectives. This paper presents an alternative approach that learns feature affinity information from unlabeled data, which is incorporated into the training objective as regularization of a maximum entropy model. The regularization favors models for which correlated features have similar weights. The method is evaluated in text classification, where feature affinity can be computed from feature co-occurrences in unlabeled data. Experimental results show that this method consistently outperforms baseline methods.
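
The idea of regularizing toward similar weights for correlated features can be sketched as follows. The abstract does not give the exact affinity measure or the form of the penalty, so this sketch makes two assumptions: affinity is a cosine-normalized document co-occurrence count, and the penalty is a graph-Laplacian quadratic added to the negative log-likelihood of a maximum entropy (softmax) model. All function names and the weight `lam` are hypothetical, chosen only for illustration.

```python
import numpy as np

def cooccurrence_affinity(X_unlabeled):
    """Feature affinity from unlabeled documents (assumed measure).

    X_unlabeled: (n_docs, n_features) count or binary matrix.
    Returns a symmetric (n_features, n_features) matrix with zero diagonal.
    """
    B = (np.asarray(X_unlabeled) > 0).astype(float)  # feature presence per document
    C = B.T @ B                                      # C[i, j] = docs containing both i and j
    df = np.diag(C).copy()                           # document frequency of each feature
    norm = np.sqrt(np.outer(df, df))
    A = np.divide(C, norm, out=np.zeros_like(C), where=norm > 0)
    np.fill_diagonal(A, 0.0)                         # no self-affinity
    return A

def affinity_penalty(W, A):
    """0.5 * sum_ij A[i, j] * ||W[:, i] - W[:, j]||^2, via the graph Laplacian.

    W: (n_classes, n_features) weight matrix. The penalty is small when
    features with high affinity have similar weights in every class.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(W @ L @ W.T))

def regularized_objective(W, X, y, A, lam):
    """Negative log-likelihood of a softmax model plus the affinity penalty."""
    scores = np.asarray(X, dtype=float) @ W.T
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(y)), y].sum()
    return nll + lam * affinity_penalty(W, A)
```

In this sketch the labeled data enter only through the likelihood term, while the unlabeled data enter only through `A`; any gradient-based optimizer could then minimize `regularized_objective` over `W`.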

Similar resources

Semi-supervised Learning of Naive Bayes Classifier with feature constraints

Semi-supervised learning methods address the problem of building classifiers when labeled data is scarce. Text classification is often augmented by a rich set of labeled features representing a particular class. As tuple-level labeling is resource-consuming, semi-supervised and weakly supervised learning methods have been explored recently. Compared to labeling data instances (documents), feature labeli...

On Propagated Scoring for Semi-supervised Additive Models

In this paper, a semi-supervised modeling framework that combines feature-based (x) data and graph-based (G) data for classification/regression of the response Y is presented. In this semi-supervised setting, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). The Propagated Scoring algorithm proposed for fitting this model is a semi-supervised fi...

Semi-supervised Collaborative Text Classification

Most text categorization methods require the text content of documents, which is often difficult to obtain. We consider “Collaborative Text Categorization”, where each document is represented by the feedback from a large number of users. Our study focuses on the semi-supervised case, in which one key challenge is that a significant number of users have not rated any labeled document. To address this pr...

Extension of TSVM to Multi-Class and Hierarchical Text Classification Problems With General Losses

Transductive SVM (TSVM) is a well-known semi-supervised large-margin learning method for binary text classification. In this paper we extend this method to multi-class and hierarchical classification problems. We point out that determining the labels of unlabeled examples with fixed classifier weights is a linear programming problem. We devise an efficient technique for solving it. The met...

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

We describe a computationally efficient, stochastic graph-regularization technique that can be used for the semi-supervised training of deep neural networks in a parallel or distributed setting. We use a technique, first described in [13], for the construction of mini-batches for stochastic gradient descent (SGD) based on synthesized partitions of an affinity graph that are consistent wi...

Publication date: 2012